AITopics | primary study

A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference

Neural Information Processing Systems | Jun-15-2026, 19:57:52 GMT

artificial intelligence, assumption, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.67)

Fused Multinomial Logistic Regression Utilizing Summary-Level External Machine-learning Information

Dai, Chi-Shian, Shao, Jun

arXiv.org Machine Learning | Apr-7-2026

In many modern applications, a carefully designed primary study provides individual-level data for interpretable modeling, while summary-level external information is available through black-box, efficient, and nonparametric machine-learning predictions. Although summary-level external information has been studied in the data integration literature, there is limited methodology for leveraging external nonparametric machine-learning predictions to improve statistical inference in the primary study. We propose a general empirical-likelihood framework that incorporates external predictions through moment constraints. An advantage of nonparametric machine-learning prediction is that it induces a rich class of valid moment restrictions that remain robust to covariate shift under a mild overlap condition without requiring explicit density-ratio modeling. We focus on multinomial logistic regression as the primary model and address common data-quality issues in external sources, including coarsened outcomes, partially observed covariates, covariate shift, and heterogeneity in generating mechanisms known as concept shift. We establish large-sample properties of the resulting fused estimator, including consistency and asymptotic normality under regularity conditions. Moreover, we provide mild sufficient conditions under which incorporating external predictions delivers a strict efficiency gain relative to the primary-only estimator. Simulation studies and an application to the National Health and Nutrition Examination Survey on multiclass blood-pressure classification illustrate the proposed method.
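As a rough illustration of the primary model named in this abstract, the sketch below computes multinomial logistic regression class probabilities with a reference class. The second function, `moment_gap`, is a hypothetical sample moment that averages the difference between model probabilities and external predictions. It is an assumption made for illustration only and is not the constraint or the empirical-likelihood procedure used by the authors. All function and parameter names are invented here.

```python
import math


def mnl_probs(x, betas):
    """Class probabilities for multinomial logistic regression.

    x: covariate list (include 1.0 if an intercept is wanted).
    betas: one coefficient list per non-reference class; the reference
    class has its linear predictor fixed at 0.
    """
    logits = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def moment_gap(xs, betas, external_preds):
    """Hypothetical sample moment (illustrative assumption, not the paper's):
    per-class mean of (model probability - external prediction)."""
    n = len(xs)
    k = len(betas) + 1
    gaps = [0.0] * k
    for x, ext in zip(xs, external_preds):
        probs = mnl_probs(x, betas)
        for j in range(k):
            gaps[j] += (probs[j] - ext[j]) / n
    return gaps
```

In a moment-constrained fit, a quantity like `moment_gap` would be driven toward zero while the primary-study likelihood is maximized; how the constraint is actually built and weighted is specified in the paper, not here.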

artificial intelligence, fmle 0, machine learning, (18 more...)

arXiv.org Machine Learning

2604.03939

Country:

Asia > Taiwan (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)

Automatic selection of primary studies in systematic reviews with evolutionary rule-based classification

de la Torre-López, José, Ramírez, Aurora, Romero, José Raúl

arXiv.org Artificial Intelligence | Sep-30-2025

Conducting a systematic literature review (SLR) is especially useful when starting a new line of research, as it involves a detailed analysis of the research topic supported by the appropriate references. This type of secondary study should be conducted following a strict protocol to ensure quality and allow replication (Booth et al., 2016). Within the SLR process, manual and automated searches are performed to identify research papers related to the topic under review (Kitchenham and Charters, 2007). Therefore, the selection of primary studies, i.e., papers of sufficient quality and truly relevant to the topic, is one of the most important steps. It is also a time-consuming task due to potentially large search results if the queries are too open-ended or the research topic is too broad. Recently, artificial intelligence (AI) has emerged as a way to assist researchers in this task, as well as in other stages of the SLR process (de la Torre-López et al., 2023). The topic has gained even more relevance since the appearance of Large Language Models (LLMs) (Han et al., 2024; Galli et al., 2025). LLMs have expanded the capabilities of AI-assisted SLRs with the ability to extract information from papers, synthesise their findings and generate texts to accelerate SLR reporting.

evolutionary algorithm, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2509.23981

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine (0.68)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.94)
(2 more...)

Incorporating External Controls for Estimating the Average Treatment Effect on the Treated with High-Dimensional Data: Retaining Double Robustness and Ensuring Double Safety

Dai, Chi-Shian, Ying, Chao, Ning, Yang, Zhao, Jiwei

arXiv.org Machine Learning | Sep-26-2025

Randomized controlled trials (RCTs) are widely regarded as the gold standard for causal inference in biomedical research. For instance, when estimating the average treatment effect on the treated (ATT), a doubly robust estimation procedure can be applied, requiring either the propensity score model or the control outcome model to be correctly specified. In this paper, we address scenarios where external control data, often with a much larger sample size, are available. Such data are typically easier to obtain from historical records or third-party sources. However, we find that incorporating external controls into the standard doubly robust estimator for ATT may paradoxically result in reduced efficiency compared to using the estimator without external controls. This counterintuitive outcome suggests that the naive incorporation of external controls could be detrimental to estimation efficiency. To resolve this issue, we propose a novel doubly robust estimator that guarantees higher efficiency than the standard approach without external controls, even under model misspecification. When all models are correctly specified, this estimator aligns with the standard doubly robust estimator that incorporates external controls and achieves semiparametric efficiency. The asymptotic theory developed in this work applies to high-dimensional confounder settings, which are increasingly common with the growing prevalence of electronic health record data. We demonstrate the effectiveness of our methodology through extensive simulation studies and a real-world data application.
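For orientation, the sketch below implements the textbook doubly robust ATT estimator that the abstract uses as its baseline, without external controls. It is not the novel estimator proposed in the paper. The propensity scores and control-outcome predictions are assumed to be supplied by the caller from models fitted elsewhere, and the function and argument names are invented for this example.

```python
def dr_att(y, a, propensity, control_outcome):
    """Standard doubly robust estimate of the ATT (no external controls).

    y: observed outcomes.
    a: treatment indicators (1 = treated, 0 = control).
    propensity: estimated P(A = 1 | X) for each unit (assumed given).
    control_outcome: estimated E[Y | A = 0, X] for each unit (assumed given).
    """
    n_treated = sum(a)
    if n_treated == 0:
        raise ValueError("at least one treated unit is required")
    total = 0.0
    for yi, ai, ei, mi in zip(y, a, propensity, control_outcome):
        residual = yi - mi
        if ai == 1:
            # Treated units contribute their outcome-model residual.
            total += residual
        else:
            # Controls are reweighted by the odds e / (1 - e) to resemble the treated.
            total -= ei / (1.0 - ei) * residual
    return total / n_treated
```

The estimate is consistent if either nuisance input is correct: with an exact control-outcome model the control residuals vanish, and with an exact propensity model the odds weighting balances the controls against the treated group.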

eff, estimator, external control, (16 more...)

arXiv.org Machine Learning

2509.20586

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

AI Simulation by Digital Twins: Systematic Survey, Reference Framework, and Mapping to a Standardized Architecture

Liu, Xiaoran, David, Istvan

arXiv.org Artificial Intelligence | Sep-1-2025

Insufficient data volume and quality are particularly pressing challenges in the adoption of modern subsymbolic AI. To alleviate these challenges, AI simulation uses virtual training environments in which AI agents can be safely and efficiently developed with simulated, synthetic data. Digital twins open new avenues in AI simulation, as these high-fidelity virtual replicas of physical systems are equipped with state-of-the-art simulators and the ability to further interact with the physical system for additional data collection. In this article, we report on our systematic survey of digital twin-enabled AI simulation. By analyzing 22 primary studies, we identify technological trends and derive a reference framework to situate digital twins and AI components. Based on our findings, we provide architectural guidelines by mapping the framework onto the ISO 23247 reference architecture for digital twins. Finally, we identify challenges and research opportunities for prospective researchers.

machine learning, reinforcement learning, simulation, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10270-025-01306-0

2506.0658

Country:

Europe (0.92)
North America > Canada (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting > Online (0.86)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(6 more...)

Towards a unified framework for programming paradigms: A systematic review of classification formalisms and methodological foundations

Vandeloise, Mikel

arXiv.org Artificial Intelligence | Aug-4-2025

The rise of multi-paradigm languages challenges traditional classification methods, leading to practical software engineering issues like interoperability defects. This systematic literature review (SLR) maps the formal foundations of programming paradigms. Our objective is twofold: (1) to assess the state of the art of classification formalisms and their limitations, and (2) to identify the conceptual primitives and mathematical frameworks for a more powerful, reconstructive approach. Based on a synthesis of 74 primary studies, we find that existing taxonomies lack conceptual granularity, a unified formal basis, and struggle with hybrid languages. In response, our analysis reveals a strong convergence toward a compositional reconstruction of paradigms. This approach identifies a minimal set of orthogonal, atomic primitives and leverages mathematical frameworks, predominantly Type theory, Category theory and Unifying Theories of Programming (UTP), to formally guarantee their compositional properties. We conclude that the literature reflects a significant intellectual shift away from classification towards these promising formal, reconstructive frameworks. This review provides a map of this evolution and proposes a research agenda for their unification.

logic & formal reasoning, paradigm, programming language, (22 more...)

arXiv.org Artificial Intelligence

2508.00534

Country:

Europe > United Kingdom > England (0.46)
North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Software Engineering (1.00)
Information Technology > Information Management (1.00)
(2 more...)

Learning Software Bug Reports: A Systematic Literature Review

Long, Guoming, Gong, Jingzhi, Fang, Hui, Chen, Tao

arXiv.org Artificial Intelligence | Jul-22-2025

The recent advancement of artificial intelligence, especially machine learning (ML), has significantly impacted software engineering research, including bug report analysis. ML aims to automate the understanding, extraction, and correlation of information from bug reports. Despite its growing importance, there has been no comprehensive review in this area. In this paper, we present a systematic literature review covering 1,825 papers, selecting 204 for detailed analysis. We derive seven key findings: 1) Extensive use of CNN, LSTM, and kNN for bug report analysis, with advanced models like BERT underutilized due to their complexity. 2) Word2Vec and TF-IDF are popular for feature representation, with a rise in deep learning approaches. 3) Stop word removal is the most common preprocessing, with structural methods rising after 2020. 4) Eclipse and Mozilla are the most frequently evaluated software projects. 5) Bug categorization is the most common task, followed by bug localization and severity prediction. 6) There is increasing attention on specific bugs like non-functional and performance bugs. 7) Common evaluation metrics are F1-score, Recall, Precision, and Accuracy, with k-fold cross-validation preferred for model evaluation. 8) Many studies lack robust statistical tests. We also identify six promising future research directions to provide useful insights for practitioners.

artificial intelligence, bug report analysis, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2507.04422

Country:

North America > United States (1.00)
Asia > China (1.00)
Europe > United Kingdom > England (0.28)
North America > Canada > Ontario (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (0.92)
Education > Educational Technology > Educational Software > Computer Based Training (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review

Mohamed, Amr, Assi, Maram, Guizani, Mariam

arXiv.org Artificial Intelligence | Jul-8-2025

Large language model assistants (LLM-assistants) present new opportunities to transform software development. Developers are increasingly adopting these tools across tasks, including coding, testing, debugging, documentation, and design. Yet, despite growing interest, there is no synthesis of how LLM-assistants affect software developer productivity. In this paper, we present a systematic literature review of 37 peer-reviewed studies published between January 2014 and December 2024 that examine this impact. Our analysis reveals that LLM-assistants offer both considerable benefits and critical risks. Commonly reported gains include minimized code search, accelerated development, and the automation of trivial and repetitive tasks. However, studies also highlight concerns around cognitive offloading, reduced team collaboration, and inconsistent effects on code quality. While the majority of studies (92%) adopt a multi-dimensional perspective by examining at least two SPACE dimensions, reflecting increased awareness of the complexity of developer productivity, only 14% extend beyond three dimensions, indicating substantial room for more integrated evaluations. Satisfaction, Performance, and Efficiency are the most frequently investigated dimensions, whereas Communication and Activity remain underexplored. Most studies are exploratory (64%) and methodologically diverse, but lack longitudinal and team-based evaluations. This review surfaces key research gaps and provides recommendations for future research and practice. All artifacts associated with this study are publicly available at https://zenodo.org/records/15788502.

large language model, llm-assistant, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.03156

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Kingston (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study > Negative Result (0.46)

Industry:

Education (1.00)
Information Technology > Software (0.73)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference

Parikh, Harsh, Nguyen, Trang Quynh, Stuart, Elizabeth A., Rudolph, Kara E., Miles, Caleb H.

arXiv.org Artificial Intelligence | May-19-2025

Data integration approaches are increasingly used to enhance the efficiency and generalizability of studies. However, a key limitation of these methods is the assumption that outcome measures are identical across datasets -- an assumption that often does not hold in practice. Consider the following opioid use disorder (OUD) studies: the XBOT trial and the POAT study, both evaluating the effect of medications for OUD on withdrawal symptom severity (not the primary outcome of either trial). While XBOT measures withdrawal severity using the subjective opiate withdrawal scale, POAT uses the clinical opiate withdrawal scale. We analyze this realistic yet challenging setting where outcome measures differ across studies and where neither study records both types of outcomes. Our paper studies whether and when integrating studies with disparate outcome measures leads to efficiency gains. We introduce three sets of assumptions -- with varying degrees of strength -- linking both outcome measures. Our theoretical and empirical results highlight a cautionary tale: integration can improve asymptotic efficiency only under the strongest assumption linking the outcomes. However, misspecification of this assumption leads to bias. In contrast, a milder assumption may yield finite-sample efficiency gains, yet these benefits diminish as sample size increases. We illustrate these trade-offs via a case study integrating the XBOT and POAT datasets to estimate the comparative effect of two medications for opioid use disorder on withdrawal symptoms. By systematically varying the assumptions linking the SOW and COW scales, we show potential efficiency gains and the risks of bias. Our findings emphasize the need for careful assumption selection when fusing datasets with differing outcome measures, offering guidance for researchers navigating this common challenge in modern data integration.

artificial intelligence, assumption, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.11014

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.87)

Aggregating empirical evidence from data strategy studies: a case on model quantization

del Rey, Santiago, Santos, Paulo Sérgio Medeiros dos, Travassos, Guilherme Horta, Franch, Xavier, Martínez-Fernández, Silverio

arXiv.org Artificial Intelligence | May-5-2025

Background: As empirical software engineering evolves, more studies adopt data strategies--approaches that investigate digital artifacts such as models, source code, or system logs rather than relying on human subjects. Synthesizing results from such studies introduces new methodological challenges. Aims: This study assesses the effects of model quantization on correctness and resource efficiency in deep learning (DL) systems. Additionally, it explores the methodological implications of aggregating evidence from empirical studies that adopt data strategies. Method: We conducted a research synthesis of six primary studies that empirically evaluate model quantization. We applied the Structured Synthesis Method (SSM) to aggregate the findings, which combines qualitative and quantitative evidence through diagrammatic modeling. A total of 19 evidence models were extracted and aggregated. Results: The aggregated evidence indicates that model quantization weakly negatively affects correctness metrics while consistently improving resource efficiency metrics, including storage size, inference latency, and GPU energy consumption--a manageable trade-off for many DL deployment contexts. Evidence across quantization techniques remains fragmented, underscoring the need for more focused empirical studies per technique. Conclusions: Model quantization offers substantial efficiency benefits with minor trade-offs in correctness, making it a suitable optimization strategy for resource-constrained environments. This study also demonstrates the feasibility of using SSM to synthesize findings from data strategy-based research. Software engineering (SE) increasingly relies on data strategy studies [1] to understand and improve software development and deployment practices. Data strategies refer to "empirical studies that rely primarily on archival, generated or simulated data" [1], using a wide range of specific methods, including experiments and data mining studies. Although these studies provide valuable information, they remain largely disconnected, with findings often limited to specific contexts and lacking broader theoretical integration. Therefore, the SE field struggles with few theories and needs more structured syntheses of existing research to guide future advancements.

This work is also partially funded by the Joan Oró pre-doctoral support program (BDNS 657443), co-funded by the European Union.
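To make the correctness versus storage trade-off in this abstract concrete, here is a minimal sketch of symmetric uniform quantization of a weight list to signed integer codes. The abstract does not say which quantization techniques the primary studies used, so this scheme, the function names, and the 8-bit default are assumptions chosen purely for illustration.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integer codes using one shared scale (illustrative scheme)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    largest = max((abs(v) for v in values), default=0.0)
    # Fall back to a scale of 1.0 when every value is zero to avoid division by zero.
    scale = largest / qmax if largest > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]
```

Each code fits in `num_bits` bits instead of the 32 bits of a single-precision float, which is where the storage saving comes from. The rounding step bounds the per-value reconstruction error by about half the scale, which is the source of the small loss in correctness.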

data mining, machine learning, quantization, (17 more...)

arXiv.org Artificial Intelligence

2505.00816

Country:

Europe (0.48)
South America > Brazil (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)
